Genome resequencing reveals independent domestication and breeding improvement of naked oat

Abstract
As an important cereal crop, common oat has attracted increasing attention due to its healthy nutritional components and bioactive compounds. Here, high-depth resequencing of 115 oat accessions and closely related hexaploid species worldwide was performed. Based on genetic diversity and linkage disequilibrium analysis, it was found that hulled oat (Avena sativa) experienced a more severe bottleneck than naked oat (Avena sativa var. nuda). Combined with the divergence time of ∼51,200 years ago, the previous speculation that naked oat was a variant of hulled oat was rejected. It was found that the common segments that hulled oat introgressed to naked oat cultivars contained 444 genes, mainly enriched in photosynthetic efficiency-related pathways. Selective sweeps during environmental adaptation and breeding improvement were identified in the naked oat genome. Candidate genes associated with smut resistance and the days to maturity phenotype were also identified. Our study provides genomic resources and new insights into naked oat domestication and breeding.


Introduction
Common oat ranks seventh in production among global cereal crops (http://www.fao.org/faostat/en/, accessed May 2022). It is one of the most important crops in several countries and is widely used as human food and animal feed [1]. Oats are high in protein, rich in polyunsaturated fatty acids, and have a low carbon footprint [2]. In recent years, oat has attracted increasing attention as a healthy food because of its rich content of bioactive compounds, which can reduce the risk of cardiovascular diseases (CVD), type 2 diabetes mellitus (T2DM), gastrointestinal disorders, and cancer [3]. Oat is highly adaptable to a wide range of climates: it can be widely planted and give high yields in harsh marginal environments where other major cereal crops, such as rice and corn, cannot be grown [4]. In China, naked oat landraces are distributed from the warm and humid Yunnan-Guizhou region to the cold and dry Shanxi-Gansu region.

Common oat is typically classified into two types according to seed morphology: hulled and naked. Hulled oat is the widely known type grown all over the world, while naked oat is grown mainly in China [5,6], and the most extensive germplasm collection for naked oat is maintained at the National Germplasm Bank of China (NGBC, http://www.cgris.net). Hulled oat has a caryopsis tightly surrounded by a thick, lignin-rich hull that remains attached to the mature grain throughout threshing and cleaning (Figure 1a). In contrast, naked oat is characterized by papery, free-threshing hulls that are mostly lost during threshing and cleaning [7]. Besides the free-threshing attribute, naked oat differs from hulled oat by having a multiflorous habit and elongated rachilla segments in the mature panicle [8].

Common oat is usually considered a secondary crop, i.e., derived from a weed of the primary cereal domesticates, wheat and barley [9].
Current thinking is that common oat was probably domesticated in central or northern Europe ~3000 years ago from a weedy hexaploid progenitor that may have been used as a forage crop [10]. There is as yet no consensus on the origin of naked oat. It has been proposed that naked oat is a separate species from hulled oat, named Avena nuda L. [11]; it has also been proposed that naked oat arose as a mutant of domesticated hulled oat. The mutant theory suggests that naked oat may have originated from hulled oat, potentially after hulled oat reached China from its main center of diversity in southwest Asia [2,12]. However, direct molecular evidence remains scarce. It has been reported that the genetic diversity of naked oat is lower than that of hulled oat, which would support a bottleneck effect from a mutation event to domestication [7]. However, that study was based [...]

[...] and ONL/OH is similar (0.085 vs. 0.082). This may be related to the breeding history of hulled-naked hybridization in ONC [16].

We conducted a runs of homozygosity (ROH) analysis, which indicated the following order for the degree of inbreeding (highest first): ONC, OH, ONL, and the closely related hexaploid species (OG) (Figure 2c). We detected strong linkage disequilibrium (LD) in the oat genome (Figure 2d) and noted that the r² of LD decay was greater than 0.4 for windows larger than 1 Mb in all populations. This level of LD was much higher than in other crops such as rice [17], maize [18,19], and sorghum [20,21]. The LD of ONC was much higher than that of OH and ONL, consistent with the expectation that cross-breeding increases LD [22].

Differentiation of hulled oat and naked oat
Although early studies proposed that naked oat was a separate species [...]

We mentioned above that the genetic diversity of ONL was higher than that of OH. To avoid the effects of gene flow and sample size, we performed 100 random samplings of ONLc (n=19, consistent with the sample size of OHc) to calculate genetic diversity. We found that the genetic diversity of ONLc was still higher than that of OHc (1.08e-3 vs. 0.90e-3). Although the mapping rates of ONL and OH were high and not significantly different (99.04% vs. 99.92%, Figure 3d), the mapping rates of ONL had greater variation (standard deviation: 1.91 vs. 0.02). The PCA results showed that ONL occupied the largest variable space (Figure 3e), consistent with the genetic diversity findings. These results contradict the speculation that naked oat is a variant of hulled oat, under which naked oat would have experienced an additional bottleneck, resulting in lower genetic diversity than hulled oat. Moreover, hulled oat had stronger short-distance LD (Figure 2d), indicating that hulled oat experienced a more severe bottleneck than naked oat. Combined with the deep split between hulled oat and naked oat on the phylogenetic tree, these lines of evidence argue against the idea that naked oat originated as a variant of hulled oat.

We also modeled the differentiation time of hulled oat and naked oat. To calculate the divergence time, we extracted all high-quality 4dtv loci genotypes in the whole genome. Through these neutral evolutionary sequences and using a [...]

[...] Table S4 and S5), suggesting that hulled oat may have contributed to yield improvements for naked oat cultivars by altering photosynthetic efficiency.
Among these 444 introgressed genes, many predicted gene products could be related to differential yield, including, for example, a VIN3-like protein (Pepsico2_Contig10032), [...] (Table S6), a GDSL esterase/lipase (Pepsico2_Contig5911), bZIP transcription factors (Table S6), and light-harvesting chlorophyll a/b binding proteins (Table S6).
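
The equal-size subsampling comparison described earlier in this section (100 random draws of n=19 from ONLc) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes biallelic SNPs coded as alternate-allele dosages (0, 1, 2), treats each accession as diploid at each locus, and uses hypothetical function names (`nucleotide_diversity`, `subsampled_diversity`) introduced only for illustration.

```python
import random


def nucleotide_diversity(genotypes, n_sites):
    """Per-site nucleotide diversity (pi) from biallelic SNP dosages.

    genotypes: list of SNPs, each a list of alt-allele counts (0/1/2) per accession.
    n_sites:   total number of callable sites (variant and invariant).
    Assumption: accessions are treated as diploid at every locus.
    """
    total = 0.0
    for site in genotypes:
        n_chrom = 2 * len(site)            # number of sampled chromosomes
        p = sum(site) / n_chrom            # alt-allele frequency
        # Expected pairwise differences with the small-sample correction n/(n-1)
        total += 2 * p * (1 - p) * n_chrom / (n_chrom - 1)
    return total / n_sites


def subsampled_diversity(genotypes, n_sites, sample_size, n_draws=100, seed=0):
    """Mean pi over repeated random subsamples of a fixed number of accessions."""
    rng = random.Random(seed)
    n_accessions = len(genotypes[0])
    estimates = []
    for _ in range(n_draws):
        chosen = rng.sample(range(n_accessions), sample_size)
        subset = [[site[i] for i in chosen] for site in genotypes]
        estimates.append(nucleotide_diversity(subset, n_sites))
    return sum(estimates) / len(estimates)
```

Calling `subsampled_diversity(onlc_genotypes, n_sites, sample_size=19)` on a hypothetical ONLc genotype matrix would give a value comparable to π computed on the 19 OHc accessions, since both estimates then rest on the same sample size.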

Environmental adaptability of naked oat landraces and artificial selection of cultivars
Oat is highly adaptable to various climates, including arid and cold regions, and is an excellent species for studying crop abiotic stress tolerance [28,29]. Studying the environmental adaptability of oat can point toward genetic mechanisms and can also provide guidance for the genetic improvement of oat. To study the environmental adaptation mechanisms of naked oat, we used two clades of ONL on the phylogenetic tree: one for accessions from arid and low-temperature regions of China, including Gansu, Ningxia, and Qinghai (Group GNQ), and one for accessions from Yunnan, Sichuan, and Guizhou (Group YG), three provinces with relatively higher rainfall and temperature (Table S7).

The annual rainfall, average temperature, frost-free period, and accumulated temperature of YG were significantly higher than those of GNQ by Student's t-test: the annual rainfall (1203. [...]

[...] annotations related to drought resistance, cold acclimation, and DNA damage repair (Table S8 and S9).

Naked oat breeding programs have successfully improved cultivars' yield and lodging resistance [16,25]. We investigated yield-related agronomic traits, including spikelet number, grain number per spike, spikelet length, panicle length, and grain weight per spike for OH, ONL, and ONC (Table S10). We found that ONC was significantly improved over ONL for all these phenotypes (P < 0.01, Student's t-test). In particular, spikelet number (Figure 5f [...]

Seeking to identify genes that were selected during the improvement of ONC, we performed a selective sweep analysis using ONLc as a reference (Figure 5g). We predicted a total of 7,667 selective sweep signatures. The set of genes within the putatively selected regions was enriched for annotations including "carbohydrate mediated signaling", "sugar mediated signaling pathway", and "regulation of developmental vegetative growth" (Table S11 and S12).
Some of the selected genes may be related to cultivars' improved yield and lodging resistance. [...] (Table S13). Many yield-related genes were among the candidate regions, such as AP2 (13 genes were annotated as AP2, Table S12). The APETALA 2 (AP2)-like family plays an essential role in inflorescence and spikelet development [32]. These candidate genes can provide helpful information for future oat improvement.

Association analysis of smut resistance and days to maturity in oat
Oat smut is a major oat disease caused by fungal pathogens of the family Ustilaginaceae [33]; it affects the heading stage and seriously reduces oat yields [34]. To enable an association analysis, we downloaded phenotypic data of the 115 accessions (Table S2) [...] and three-year averages were taken. Using the detected 52,817,822 SNPs, we performed a genome-wide association study (GWAS) and identified a significant association signal on oat chromosome 2D (Fig. 1A). Five predicted ORFs contained significant SNPs or were located upstream or downstream of significant SNPs. A gene (Pepsico2_Contig5200) positioned 43 kb from the most significant SNP was annotated as Zealexin A1 synthase (Figure 1b). In maize, CYP71Z18 catalyzes the formation of maize phytoalexins, including zealexin A1. [...]

We found that the genetic diversity of naked oat was higher than that of hulled oat (1.23e-3 vs. 1.12e-3). This is contrary to previous reports [2,7] and could reflect the low number of markers used in those studies (8,675 haplotype markers). We found considerable differences in genetic diversity among chromosomes; in ONLc, for example, the mean π of the chromosomes with the smallest and largest genetic diversity was 0.47e-3 and 2.45e-3, respectively (Figure 2a). Therefore, a small number of markers may lead to sampling bias.

Moreover, compared with naked oat, hulled oat had stronger short-distance linkage disequilibrium. These pieces of evidence suggest that hulled oat experienced a more severe bottleneck than naked oat, which contradicts the previously reported speculation that naked oat originated as a variant of hulled oat [7]. If naked oat were a variant of hulled oat, it would have experienced an additional bottleneck after the domestication bottleneck shared with hulled oat.
By calculating the divergence time of hulled oat and naked oat, we estimate that these two oat types differentiated ~51,200 years ago, much earlier than the estimated domestication time of ~3000 years ago for oat [10]. In contrast to multiple previous proposals [2,7], these lines of evidence from our study suggest that hulled oat and naked oat were domesticated independently.
Through the efforts of breeders in recent decades, the yield of naked oat has been dramatically improved [16,25]. This was achieved through continuous artificial selection of landraces and cross-breeding with hulled oat. We found genetic evidence for both routes.

Linkage disequilibrium decay
To estimate and compare linkage disequilibrium (LD) decay patterns, we used PopLDdecay (RRID:SCR_022509) [53] to calculate the mean squared correlation coefficient (r²) values of all SNP pairs within 1 Mb. A bin size of 500 bp was used to generate the LD decay plot.
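
The binning step described above can be sketched as follows. This is an illustration only and does not reproduce PopLDdecay's implementation: it assumes r² is taken as the squared Pearson correlation of genotype dosages (0, 1, 2), a simplification, and that SNP positions on one chromosome are sorted in ascending order. The function names `r_squared` and `ld_decay` are hypothetical.

```python
from collections import defaultdict


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Returns None when either SNP is monomorphic (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov * cov / (var_x * var_y)


def ld_decay(positions, genotypes, max_dist=1_000_000, bin_size=500):
    """Mean r^2 per distance bin for all SNP pairs within max_dist.

    positions: sorted base-pair coordinates on one chromosome (assumption).
    genotypes: one dosage vector per SNP, in the same order as positions.
    Returns {bin_start_bp: mean_r2}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break                      # positions are sorted, so stop early
            r2 = r_squared(genotypes[i], genotypes[j])
            if r2 is None:
                continue
            b = dist // bin_size
            sums[b] += r2
            counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned bin starts against the mean r² values gives a decay curve of the kind shown in Figure 2d. The nested loop is quadratic in the number of SNPs, so it is only practical for small regions.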

Divergence time estimation
To estimate the divergence time between hulled oat and naked oat, we identified all 6,814,874 four-fold degenerate loci (4dtv) according to the gene models of the oat reference genome annotation. These loci from three high-depth sequencing accessions (R86 from naked oat, R148 from hulled oat, and A. fatua) were then genotyped using BCFtools (RRID:SCR_005227) [46]. To obtain high-confidence genotypes, loci with QUAL less than 60 or GQ less than 30 in any accession were filtered out. Finally, 38,028 loci were left to estimate the divergence time. [...]

[...] whole genome association study (GWAS). Bonferroni correction was used to correct for multiple testing, with a significance level of 0.05 (α = 0.05). Linkage disequilibrium blocks were detected and visualized using PopLDdecay [53].
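
As a worked illustration of the Bonferroni correction mentioned above, the per-SNP significance threshold is α divided by the number of tests. The sketch below assumes that the number of tests equals the 52,817,822 detected SNPs; the text does not state how many SNPs entered the association test, so that count is an assumption. The function names and the SNP identifiers in the usage note are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_snps(pvalues, alpha=0.05, n_tests=None):
    """Return SNP identifiers whose p-value passes the Bonferroni threshold.

    pvalues: dict mapping SNP identifier -> association p-value.
    n_tests: number of tests to correct for; defaults to len(pvalues).
    """
    if n_tests is None:
        n_tests = len(pvalues)
    threshold = bonferroni_threshold(alpha, n_tests)
    return [snp for snp, p in pvalues.items() if p < threshold]


# Assumption: every detected SNP was tested once.
N_SNPS = 52_817_822
threshold = bonferroni_threshold(0.05, N_SNPS)   # roughly 9.5e-10
neg_log10 = -math.log10(threshold)               # roughly 9.02 on a Manhattan plot axis
```

For example, `significant_snps({"snp_a": 1e-12, "snp_b": 1e-6}, n_tests=N_SNPS)` keeps only `snp_a`, because 1e-6 is far above the corrected threshold.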